Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach

Authors

Yang Cao

Zheng Wen

Branislav Kveton

Yao Xie

Abstract

Multi-armed bandit (MAB) is a class of online learning problems in which a learning agent aims to maximize its expected cumulative reward while repeatedly selecting arms to pull, each with an unknown reward distribution. In this paper, we consider a scenario in which the arms' reward distributions may change in a piecewise-stationary fashion at unknown time steps. By connecting change-detection techniques with classic UCB algorithms, we motivate and propose a learning algorithm called M-UCB, which can detect and adapt to changes, for the considered scenario. We also establish an O(√(MKT log T)) regret bound for M-UCB, where T is the number of time steps, K is the number of arms, and M is the number of stationary segments. Comparison with the best available lower bound shows that M-UCB is nearly optimal in T up to a logarithmic factor. We also compare M-UCB with state-of-the-art algorithms in a numerical experiment based on a public Yahoo! dataset. In this experiment, M-UCB achieves about 50% regret reduction with respect to the best performing state-of-the-art algorithm.
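The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea of pairing a change detector with UCB, not the authors' M-UCB. It assumes rewards in [0, 1], a UCB1 index, occasional uniform exploration, and a simple half-window test that compares the reward sums of the two halves of each arm's most recent observations. The names `change_detected` and `ucb_with_restarts`, and the parameters `window`, `threshold`, and `explore`, are hypothetical and chosen for illustration.

```python
import math
import random


def change_detected(window, threshold):
    """Assumed detector: flag a change when the sums of the older and newer
    halves of the window differ by more than `threshold`."""
    half = len(window) // 2
    return abs(sum(window[half:]) - sum(window[:half])) > threshold


def ucb_with_restarts(pull, n_arms, horizon, window=50, threshold=15.0,
                      explore=0.05, rng=None):
    """UCB1 with uniform exploration; all statistics are reset on detection.

    `pull(arm, t)` must return a reward in [0, 1].
    Returns (choices, restarts): the arm chosen at each step and the steps
    at which a change was flagged.
    """
    rng = rng or random.Random(0)
    counts = [0] * n_arms                  # pulls per arm since last restart
    sums = [0.0] * n_arms                  # reward totals since last restart
    recent = [[] for _ in range(n_arms)]   # last `window` rewards per arm
    last_restart = 0
    choices, restarts = [], []

    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]               # pull every arm once after a restart
        elif rng.random() < explore:
            arm = rng.randrange(n_arms)    # forced exploration feeds the detector
        else:
            n = t - last_restart           # steps elapsed since last restart
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n) / counts[a]),
            )

        reward = pull(arm, t)
        choices.append(arm)
        counts[arm] += 1
        sums[arm] += reward
        recent[arm].append(reward)
        if len(recent[arm]) > window:
            recent[arm].pop(0)

        if len(recent[arm]) == window and change_detected(recent[arm], threshold):
            # Discard everything learned so far and start a fresh UCB phase.
            counts = [0] * n_arms
            sums = [0.0] * n_arms
            recent = [[] for _ in range(n_arms)]
            last_restart = t + 1
            restarts.append(t)

    return choices, restarts


if __name__ == "__main__":
    # Toy piecewise-stationary environment: arm 0 is best until step 500,
    # after which its reward drops to 0 and arm 1 becomes the better arm.
    def toy_pull(arm, t):
        if arm == 0:
            return 1.0 if t < 500 else 0.0
        return 0.5

    choices, restarts = ucb_with_restarts(toy_pull, n_arms=2, horizon=1000)
    print("restart steps:", restarts)
```

The uniform exploration step is a design choice in this sketch: without it, an arm that currently looks poor would rarely be sampled, and a change that makes it the best arm could go unnoticed by the detector.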

Related papers

A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem

The multi-armed bandit problem has been extensively studied under the stationary assumption. However, in reality this assumption often does not hold, because the distributions of rewards themselves may change over time. In this paper, we propose a change-detection (CD) based framework for multi-armed bandit problems under the piecewise-stationary setting, and study a class of change-detection bas...

Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection

Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propos...

Adaptive Segmentation with Optimal Window Length Scheme using Fractal Dimension and Wavelet Transform

In many signal processing applications, such as EEG analysis, the non-stationary signal is often required to be segmented into small epochs. This is accomplished by drawing the boundaries of signal at time instances where its statistical characteristics, such as amplitude and/or frequency, change. In the proposed method, the original signal is initially decomposed into signals with different fr...

Change Point Estimation of the Stationary State in Auto Regressive Moving Average Models, Using Maximum Likelihood Estimation and Singular Value Decomposition-based Filtering

In this paper, change point estimation is applied, for the first time, to the stationary state of auto regressive moving average (ARMA) (1, 1) models. In the monitoring phase, when the characteristic of interest follows a time series, i.e., ARMA(1,1), an approach based on the maximum likelihood technique will be developed for the estimation of the stationary state’s change...

Change-point Detection for Lévy Processes

Since the work of Page in the 1950s, the problem of detecting an abrupt change in the distribution of stochastic processes has received a great deal of attention. In particular, a deep connection has been established between Lorden’s minimax approach to change-point detection and the widely used CUSUM procedure, first for discrete-time processes, and subsequently for some of their continuous-ti...

Journal:

CoRR

Volume: abs/1802.03692

Publication date: 2018
